Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(NODE-6403): Add CSOT support to client bulk write #4261

Merged
merged 6 commits into from
Oct 14, 2024

Conversation

baileympearson
Copy link
Contributor

@baileympearson baileympearson commented Oct 1, 2024

Description

What is changing?

This PR adds support for CSOT to the client bulk write API.

Is there new documentation needed for these changes?

What is the motivation for this change?

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@baileympearson baileympearson changed the base branch from main to NODE-6090 October 1, 2024 19:21
@baileympearson baileympearson changed the title feat(NODE-6090): Implement CSOT logic for connection checkout and ser… feat(NODE-6090): Add CSOT support to client bulk write Oct 1, 2024
@baileympearson baileympearson force-pushed the csot-client-bulk-writ branch 4 times, most recently from d6b99ed to 048868c Compare October 7, 2024 19:36
@@ -116,6 +116,7 @@ export function onData(
emitter.off('data', eventHandler);
emitter.off('error', errorHandler);
finished = true;
timeoutForSocketRead?.clear();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the cause of the hanging test suites - some CSOT tests spawn huge timeouts that we never clean up.

yield { parse: () => writeErrorsReply };
} else {
yield (await realIterator.next()).value;
try {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change on onData, we need to be sure we call finally in the mock too.

@@ -16,7 +16,7 @@ describe('Collection Management and Db Management', function () {
});

it('returns a collection object after calling createCollection', async function () {
const collection = await db.createCollection('collection');
const collection = await db.createCollection(new ObjectId().toHexString());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the test suite twice and this failed the second time - I can remove this if we want but this guarantees we never create the same collection twice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should be more responsible about their side effects, like: returns true after calling dropDatabase is dropping the common default 'test' db. This is fine to leave in, but we could do a bit more improvement in the before/after dept. up to u!


client = this.configuration.newClient({ timeoutMS: 2000, monitorCommands: true });
client.on('commandStarted', filterForCommands('bulkWrite', writes));
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prose test incorporates changes from mongodb/specifications#1672.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify drivers may provide maxWriteBatchSize

where are we providing that value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah in the makeMultiBatchWrite helper, got it

@baileympearson baileympearson changed the title feat(NODE-6090): Add CSOT support to client bulk write feat(NODE-6403): Add CSOT support to client bulk write Oct 9, 2024
@baileympearson baileympearson force-pushed the csot-client-bulk-writ branch 2 times, most recently from b5bf67e to 79aa064 Compare October 9, 2024 14:53
@nbbeeken nbbeeken self-assigned this Oct 10, 2024
@nbbeeken nbbeeken added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Oct 10, 2024
src/timeout.ts Outdated
@@ -171,7 +171,7 @@ function isCSOTTimeoutContextOptions(v: unknown): v is CSOTTimeoutContextOptions

/** @internal */
export abstract class TimeoutContext {
static create(options: TimeoutContextOptions): TimeoutContext {
static create(options: Partial<TimeoutContextOptions>): TimeoutContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we always needed to make sure there was a serverSelectionTimeoutMS, shouldn't that continue to be required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I believe that was an oversight. Why was it required in the first place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it always necessary to provide because serverSelection is the max bound of that step? @W-A-James may be more familiar than I

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually I think we'll want to create timeout contexts that apply to subparts of the driver. For example, authentication and monitoring execute commands directly on the connection. I envisioned us creating timeout with TimeoutContext.create() and passing the result in directly.

These places do not have a server selection timeout provided and do not need one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn't how we use it now can we keep making sure we pass in this value to not make any mistakes for the current use cases? making it Partial later when there is no need for it shouldn't be too much effort, but we're still WIP on the CSOT use case that does need it.

src/utils.ts Outdated Show resolved Hide resolved
src/utils.ts Outdated Show resolved Hide resolved
src/operations/client_bulk_write/executor.ts Show resolved Hide resolved

client = this.configuration.newClient({ timeoutMS: 2000, monitorCommands: true });
client.on('commandStarted', filterForCommands('bulkWrite', writes));
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify drivers may provide maxWriteBatchSize

where are we providing that value?

});
});

it('timeoutMS is used as the timeout for the bulk write', metadata, async function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it "throws MongoOperationTimeoutError when the server does not reply in time"

Not necessarily exactly what I've written but the expected API is thrown error vs "using X as the timeout" is the implementation right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the expectation is that the operation takes no longer than timeoutMS. That we throw a timeout error is just a manifestation of this in our public API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That manifestation (throwing) is the expected behavior right? if the expectation is that it takes no longer than timeoutMS then shouldn't there be assertions that measure the distance between a start and end time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The expected behavior is that the operation times out within timeoutMS. The timeout error is how we communicate this to users but if that's all we consider the "semver API" - what is stopping us from immediately throwing a timeout error before executing any commands?

I've adjusted the tests to measure the execution time of each operation and assert it is within a range of timeoutMS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout error is how we communicate this to users

Isn't MongoOperationTimeoutError still the expected error? I thought these tests checked that but now it is MongoError.

what is stopping us from immediately throwing a timeout error before executing any commands?

Yea I agree, I think this morning we realized we don't really have coverage for meeting the deadline in general because the spec tests only expect error, but we should think about introducing time-taken assertions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed them to MongoError because while timeout errors are the actual error thrown, that's not what this test is asserting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have tests that separately cover client.bulkWrite throws the expected timeout error? Are there CSOT spec tests for client.bulkWrite? that would suffice

Any other error thrown within the expected deadline isn't capturing the expected behavior, how we communicate the timeout matters if one is to write code against that behavior.

test/integration/crud/client_bulk_write.test.ts Outdated Show resolved Hide resolved

client = this.configuration.newClient({ timeoutMS: 2000, monitorCommands: true });
client.on('commandStarted', filterForCommands('bulkWrite', writes));
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah in the makeMultiBatchWrite helper, got it

test/integration/crud/client_bulk_write.test.ts Outdated Show resolved Hide resolved
});
});

it('timeoutMS is used as the timeout for the bulk write', metadata, async function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That manifestation (throwing) is the expected behavior right? if the expectation is that it takes no longer than timeoutMS then shouldn't there be assertions that measure the distance between a start and end time?

src/utils.ts Outdated Show resolved Hide resolved
src/utils.ts Outdated Show resolved Hide resolved
});
});

it('timeoutMS is used as the timeout for the bulk write', metadata, async function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout error is how we communicate this to users

Isn't MongoOperationTimeoutError still the expected error? I thought these tests checked that but now it is MongoError.

what is stopping us from immediately throwing a timeout error before executing any commands?

Yea I agree, I think this morning we realized we don't really have coverage for meeting the deadline in general because the spec tests only expect error, but we should think about introducing time-taken assertions.

src/timeout.ts Outdated
@@ -171,7 +171,7 @@ function isCSOTTimeoutContextOptions(v: unknown): v is CSOTTimeoutContextOptions

/** @internal */
export abstract class TimeoutContext {
static create(options: TimeoutContextOptions): TimeoutContext {
static create(options: Partial<TimeoutContextOptions>): TimeoutContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn't how we use it now can we keep making sure we pass in this value to not make any mistakes for the current use cases? making it Partial later when there is no need for it shouldn't be too much effort, but we're still WIP on the CSOT use case that does need it.

});
});

it('timeoutMS is used as the timeout for the bulk write', metadata, async function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have tests that separately cover client.bulkWrite throws the expected timeout error? Are there CSOT spec tests for client.bulkWrite? that would suffice

Any other error thrown within the expected deadline isn't capturing the expected behavior, how we communicate the timeout matters if one is to write code against that behavior.

@nbbeeken nbbeeken removed the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Oct 11, 2024
@nbbeeken nbbeeken added the Team Review Needs review from team label Oct 11, 2024
@nbbeeken nbbeeken requested review from durran and W-A-James October 11, 2024 17:23
@nbbeeken nbbeeken merged commit 7df1a70 into NODE-6090 Oct 14, 2024
20 of 27 checks passed
@nbbeeken nbbeeken deleted the csot-client-bulk-writ branch October 14, 2024 15:38
nbbeeken pushed a commit that referenced this pull request Oct 14, 2024
baileympearson added a commit that referenced this pull request Oct 21, 2024
baileympearson added a commit that referenced this pull request Oct 21, 2024
nbbeeken pushed a commit that referenced this pull request Nov 1, 2024
nbbeeken pushed a commit that referenced this pull request Nov 5, 2024
dariakp pushed a commit that referenced this pull request Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team Review Needs review from team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants